Welcome back to Pattern Recognition. Today we want to explore a few more ideas about independent component analysis (ICA). We've seen so far that Gaussianity, or rather non-Gaussianity, is an important property of independent components. So in today's video we want to look into different measures of how to actually determine this non-Gaussianity.
We've seen that the key principle in estimating the independent components is non-Gaussianity. In order to optimize the independent components, we need a quantitative measure of non-Gaussianity. So let's consider a random variable y and assume that it has zero mean and unit variance; of course, we enforce this already by appropriate pre-processing.
Now we will consider three measures of non-Gaussianity: the kurtosis, the negentropy, and the mutual information. Let's start with the kurtosis. The kurtosis is defined as kurt(y) = E[y^4] - 3 (E[y^2])^2, that is, the expected value of y to the power of 4 minus 3 times the square of the expected value of y to the power of 2. If we have zero mean and unit variance, then E[y^2] = 1, so this simplifies to kurt(y) = E[y^4] - 3: we take the fourth moment of the signal and subtract 3, because our pre-processing gives us the identity matrix as the covariance matrix and the second-moment term then contributes just the constant 3.
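As a quick illustration, here is a minimal sketch of a sample-based kurtosis estimator (the function name and the explicit standardization step are our own choices, not part of the lecture):

```python
import numpy as np

def kurtosis(y):
    """Sample kurtosis of a standardized signal: E[y^4] - 3 (E[y^2])^2."""
    y = (y - y.mean()) / y.std()  # enforce zero mean and unit variance
    # With unit variance the formula reduces to E[y^4] - 3
    return np.mean(y**4) - 3.0
```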
Now, if you have two independent random variables y1 and y2, then two useful algebraic properties hold: the kurtosis of the sum is the sum of the kurtoses, kurt(y1 + y2) = kurt(y1) + kurt(y2), and scaling with a factor alpha results in kurt(alpha * y) = alpha^4 * kurt(y), where alpha is a scalar value.
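A short numerical check of these two properties might look like this (the variable names and the zero-mean fourth-cumulant helper kurt_raw are our own; the properties themselves are exact for zero-mean independent variables):

```python
import numpy as np

def kurt_raw(y):
    """Fourth cumulant of a zero-mean signal, without re-standardization."""
    return np.mean(y**4) - 3 * np.mean(y**2)**2

rng = np.random.default_rng(0)
y1 = rng.laplace(size=100_000)           # two independent, zero-mean variables
y2 = rng.uniform(-1, 1, size=100_000)
alpha = 2.0

# Additivity under independence: kurt(y1 + y2) ~ kurt(y1) + kurt(y2)
print(kurt_raw(y1 + y2), kurt_raw(y1) + kurt_raw(y2))
# Scaling: kurt(alpha * y1) = alpha^4 * kurt(y1)
print(kurt_raw(alpha * y1), alpha**4 * kurt_raw(y1))
```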
Let's have a look at the kurtosis of a Gaussian distribution. The n-th central moment of a Gaussian distribution with p(y) = N(y; mu, sigma^2), that is, the expected value E[(y - mu)^n], is given by (n - 1)!! * sigma^n (a double factorial) if n is even, and zero if n is odd. So for a zero-mean, unit-variance random variable y that is normally distributed, E[y^4] = 3, and the kurtosis is therefore zero. The kurtosis vanishes for a Gaussian random variable with zero mean and unit variance.
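As a hedged illustration, a Monte Carlo check of this central-moment formula could look as follows (the double_factorial helper and the chosen sigma are our own):

```python
import numpy as np
from math import prod

rng = np.random.default_rng(1)
sigma = 1.5
y = rng.normal(0.0, sigma, size=1_000_000)  # zero mean, so central = raw moments

def double_factorial(n):
    return prod(range(n, 0, -2))

for n in (2, 3, 4, 5, 6):
    empirical = np.mean(y**n)
    theory = double_factorial(n - 1) * sigma**n if n % 2 == 0 else 0.0
    print(n, empirical, theory)

# For sigma = 1: E[y^4] - 3 (E[y^2])^2 = 3 - 3 = 0, so the kurtosis vanishes.
```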
For most, but not all, non-Gaussian random variables the kurtosis is non-zero. The kurtosis can be positive or negative, and typically the non-Gaussianity is measured as the absolute value of the kurtosis or the kurtosis squared. Let's look into a sub-Gaussian probability density function: here we choose the uniform distribution, and you can see that in this case the kurtosis is negative. If we take a super-Gaussian probability density function, for example the Laplacian distribution, then you will see that the kurtosis is greater than zero.
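To make this concrete, here is a small sketch comparing the two cases, with both densities scaled to zero mean and unit variance (the specific scale parameters follow from that constraint):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Uniform on [-sqrt(3), sqrt(3)]: zero mean, unit variance (sub-Gaussian)
u = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
# Laplacian with scale 1/sqrt(2): zero mean, unit variance (super-Gaussian)
l = rng.laplace(0.0, 1.0 / np.sqrt(2), size=n)

print(np.mean(u**4) - 3)  # ~ -1.2: negative kurtosis, flatter than a Gaussian
print(np.mean(l**4) - 3)  # ~ +3.0: positive kurtosis, heavier tails
```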
Now consider the 2D case with a linear combination: we can express y as w^T x. If we replace x with the mixing matrix applied to the original signals, x = A s, we can rewrite this as the inner product y = z^T s, where z = A^T w. With two variables this reads y = z1 * s1 + z2 * s2. Then the kurtosis of y is given as kurt(y) = kurt(z1 * s1) + kurt(z2 * s2), and using our scaling property this can be rewritten as kurt(y) = z1^4 * kurt(s1) + z2^4 * kurt(s2). Since y, like s1 and s2, has unit variance, we can write out the expected value of y squared, and you see that E[y^2] = z1^2 + z2^2, which is supposed to be 1 because of our scaling. So this constrains z to the unit circle in the 2D plane. Now we have to find the maximum of the objective function on the unit circle with respect to z.
The non-Gaussianity measure is then the absolute value of this reformulated kurtosis with respect to the two signals, |z1^4 * kurt(s1) + z2^4 * kurt(s2)|. Here we have a couple of examples of the landscape of this objective in the 2D plane: the thick curve is the unit circle, and the thin curves are isocontours of the objective function. You see that the maxima are located at sparse values of z, namely at z = (±1, 0) or (0, ±1), which is exactly where y = ±s_i.
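A minimal sketch of this landscape evaluation, assuming two hypothetical Laplacian sources with kurt(s_i) ≈ 3 each, could look like this:

```python
import numpy as np

# Objective |kurt(y)| = |z1^4 * kurt(s1) + z2^4 * kurt(s2)| on the unit circle
k1, k2 = 3.0, 3.0                          # assumed source kurtoses
theta = np.linspace(0, 2 * np.pi, 1000)    # parameterize the unit circle
z1, z2 = np.cos(theta), np.sin(theta)
objective = np.abs(z1**4 * k1 + z2**4 * k2)

best = theta[np.argmax(objective)]
# The maximizer is sparse: z ~ (±1, 0) or (0, ±1), i.e. y = ±s_i
print(np.cos(best), np.sin(best))
```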
So how would we maximize the non-Gaussianity of a vector w in practice? You start with some initial vector w, and then you use a gradient method to maximize the absolute value of the kurtosis, evaluated of course on the projected signal y = w^T x. You then plug this optimization into the ICA estimation algorithm that we've seen in the previous video. Let's visualize the kurtosis as a function of the direction of the projection.
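As a rough sketch of such a procedure (the function name, learning rate, iteration count, and the whitening step are our own choices; practical implementations such as FastICA use a fixed-point iteration instead of plain gradient ascent):

```python
import numpy as np

def maximize_abs_kurtosis(X, n_iter=200, lr=0.1, seed=0):
    """Gradient-ascent sketch: find w maximizing |kurt(w^T x)|.
    X has shape (dim, n_samples) and is assumed whitened (identity covariance)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ X
        kurt = np.mean(y**4) - 3.0                 # unit variance after whitening
        # Gradient of kurt(w^T x) w.r.t. w for whitened data with ||w|| = 1
        grad = 4 * (X @ y**3) / X.shape[1] - 12 * w
        w += lr * np.sign(kurt) * grad             # ascend the absolute value
        w /= np.linalg.norm(w)                     # project back to the unit sphere
    return w

# Example usage on a hypothetical mixture of two Laplacian sources
rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 50_000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])            # assumed mixing matrix
X = A @ S
X -= X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))                  # whitening via eigendecomposition
X_white = E @ np.diag(d**-0.5) @ E.T @ X
w = maximize_abs_kurtosis(X_white)
print(np.abs(np.mean((w @ X_white)**4) - 3))      # |kurt| of the recovered component
```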